Detecting Wikipedia Vandalism
Abstract
Since its inception in 2001, Wikipedia has become the largest encyclopedia ever created. With over 4 million articles in the English edition alone, it has become the highest-traffic educational website on the Internet. It receives over 100,000 edits per day, a volume that is daunting for human editors to monitor for vandalism, spam, and other inappropriate content. Existing vandalism reversion bots are generally hard-coded and may not be efficient enough at detecting vandalism. Types of vandalism include insertion of obscenities or personal attacks, deletion of valid content, and intentional introduction of incorrect facts (which can be difficult even for a human to detect). We will experiment with machine learning techniques to create a vandalism detection bot. We will consider features such as character frequencies, word attributes, attributes of the comment associated with the revision, and the history and attributes of the editor. We will train logistic regression and Naive Bayes classifiers on these features, and we will also consider training an SVM on them.
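To make the feature families named in the abstract concrete, here is a minimal Python sketch of how a few of them might be computed for a single revision. It is an illustration only, not the authors' feature set: the `Revision` container, its field names, and the specific features chosen (uppercase ratio, digit ratio, longest repeated-character run, size change, comment length, anonymity flag) are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import groupby


@dataclass
class Revision:
    """Hypothetical container for one edit; the field names are assumptions."""
    old_text: str
    new_text: str
    comment: str
    editor_is_anonymous: bool


def inserted_words(old_text: str, new_text: str) -> list[str]:
    """Return words present in new_text that the word-level diff marks as added."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return added


def longest_char_run(text: str) -> int:
    """Length of the longest run of one repeated character (0 for empty text)."""
    return max((len(list(group)) for _, group in groupby(text)), default=0)


def extract_features(rev: Revision) -> dict[str, float]:
    """Compute a small, illustrative feature vector for one revision."""
    added = " ".join(inserted_words(rev.old_text, rev.new_text))
    letters = [c for c in added if c.isalpha()]
    total = max(len(added), 1)  # avoid division by zero when nothing was added
    return {
        # Character-frequency features over the inserted text
        "upper_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        "digit_ratio": sum(c.isdigit() for c in added) / total,
        "longest_char_run": float(longest_char_run(added)),
        # Size of the change, in characters (negative means content was removed)
        "size_delta": float(len(rev.new_text) - len(rev.old_text)),
        # Attributes of the revision comment and of the editor
        "comment_length": float(len(rev.comment)),
        "editor_is_anonymous": float(rev.editor_is_anonymous),
    }
```

A dictionary like this could then be turned into a numeric vector and passed to any of the classifiers the abstract mentions. The diff is word-level here for simplicity; a real system would likely work from the wiki's own revision diffs.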
Similar Resources
Using Language Models to Detect Wikipedia Vandalism
This paper explores a statistical language modeling approach for detecting Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, together with the high visibility of its content, has exposed Wikipedia articles to vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. Ex...
Detecting Vandalism on Wikipedia across Multiple Languages
Vandalism, the malicious modification or editing of articles, is a serious problem for free and open-access online encyclopedias such as Wikipedia. Over the 13-year lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more than 500 million revisions of over 9 million English articles, but smaller manually inspected sets of revisions for research show vandalism may ap...
Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from...
Detecting Wikipedia Vandalism using WikiTrust
WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-r...
Publication date: 2012